I. INTRODUCTION


Consider the usual regression model

$$y = X\beta + \varepsilon, \qquad (1.1)$$

where $y$ is an $n \times 1$ vector of observations, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ is a $(p + 1) \times 1$ vector of unknown parameters, $\varepsilon$ is an $n \times 1$ vector of random errors, and $X$ is a fixed $n \times (p + 1)$ matrix. It will be assumed that $X$ is of full rank and that the first column of $X$ is a column of ones, denoted by $\mathbf{1}$. It will also be assumed that $\varepsilon$ has expectation $0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I$, and that $\varepsilon$ follows a normal distribution. The ordinary least squares (OLS) estimator of $\beta$ is given by

$$\hat\beta = (X'X)^{-1}X'y. \qquad (1.2)$$

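The properties of $\hat\beta$ are well known. Namely,

(1) $\hat\beta$ minimizes $(y - X\beta)'(y - X\beta)$;

(2) $\hat\beta$ is the best linear unbiased estimator of $\beta$;

(3) $\hat\beta$ is the maximum likelihood estimator of $\beta$;

(4) $\hat\beta \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$;

(5) $\mathrm{MSE}(\hat\beta) = E[(\hat\beta - \beta)'(\hat\beta - \beta)] = \sigma^2 \sum_i 1/\lambda_i$,

where the $\lambda_i$ are the eigenvalues of $X'X$.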
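Property (5) can be checked numerically. The sketch below assumes, purely for illustration, two regressors in correlation form, so that $X'X = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$ with eigenvalues $1 \pm r$; the function names are hypothetical. It shows that $\sigma^2 \sum_i 1/\lambda_i$ agrees with $\sigma^2\,\mathrm{tr}((X'X)^{-1})$ and grows without bound as $r \to 1$.

```python
def ols_mse_from_eigenvalues(r, sigma):
    """sigma^2 * sum(1 / lambda_i) when X'X = [[1, r], [r, 1]] (eigenvalues 1 + r, 1 - r)."""
    eigenvalues = (1.0 + r, 1.0 - r)
    return sigma ** 2 * sum(1.0 / lam for lam in eigenvalues)


def ols_mse_from_trace(r, sigma):
    """sigma^2 * trace((X'X)^{-1}) using the closed-form 2 x 2 inverse."""
    det = 1.0 - r * r              # determinant of [[1, r], [r, 1]]
    trace_of_inverse = 2.0 / det   # inverse is [[1, -r], [-r, 1]] / det
    return sigma ** 2 * trace_of_inverse


if __name__ == "__main__":
    sigma = 0.8
    for r in (0.0, 0.5, 0.9, 0.989, 0.999):
        print(f"r = {r:5.3f}   MSE(OLS) = {ols_mse_from_eigenvalues(r, sigma):9.3f}")
```

At $r = 0.989$, the correlation used in the simulation described later, the OLS MSE exceeds $90\sigma^2$, compared with $2\sigma^2$ for orthogonal regressors.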

Upon examination of the last of these properties, it is clear that when the minimum eigenvalue is close to zero, the mean squared error (MSE) becomes unsatisfactorily large. This led Hoerl and Kennard (1970a) to recommend the use of the "ridge estimator"

$$\beta^* = (X'X + kI)^{-1}X'y, \qquad k > 0, \qquad (1.3)$$

when $X'X$, in correlation form, is an ill-conditioned matrix. If $k$ is treated as a constant, the following properties are known:

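(1) For $k = 0$, the ridge estimator is identical to the OLS estimator.

(2) For $k > 0$, the ridge estimator is shorter (in Euclidean length) than the OLS estimator.

(3) $\beta^* \sim N\left(W_k X'X\beta, \sigma^2 W_k X'X W_k\right)$, where $W_k = (X'X + kI)^{-1}$.

(4) $\mathrm{MSE}(\beta^*) = \sigma^2 \sum_i \lambda_i/(\lambda_i + k)^2 + k^2 \beta' W_k^2 \beta$.

(5) There always exists a $k > 0$ such that $\mathrm{MSE}(\beta^*) < \mathrm{MSE}(\hat\beta)$.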
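Properties (4) and (5) are easiest to see in canonical coordinates. Writing $X'X = P\Lambda P'$ and $\alpha = P'\beta$ (notation introduced only for this sketch), the bias term becomes $k^2\beta'W_k^2\beta = k^2\sum_i \alpha_i^2/(\lambda_i + k)^2$. The sketch below evaluates the MSE expression for hypothetical eigenvalues and canonical coefficients and shows a small positive $k$ beating $k = 0$.

```python
def ridge_mse(k, eigenvalues, alpha, sigma):
    """MSE(beta*) = sigma^2 sum lam/(lam+k)^2 + k^2 sum alpha^2/(lam+k)^2 (canonical form)."""
    variance = sigma ** 2 * sum(lam / (lam + k) ** 2 for lam in eigenvalues)
    squared_bias = k ** 2 * sum(a ** 2 / (lam + k) ** 2 for lam, a in zip(eigenvalues, alpha))
    return variance + squared_bias


if __name__ == "__main__":
    # Hypothetical inputs: eigenvalues 1 +/- 0.989 and unit canonical coefficients.
    lam = (1.989, 0.011)
    alpha = (1.0, 1.0)
    sigma = 0.8
    for k in (0.0, 0.001, 0.01, 0.1, 1.0):
        print(f"k = {k:6.3f}   MSE = {ridge_mse(k, lam, alpha, sigma):8.3f}")
```

At $k = 0$ the expression reduces to $\sigma^2\sum_i 1/\lambda_i$, and its derivative there is $-2\sigma^2\sum_i \lambda_i^{-2} < 0$, which is why some $k > 0$ always improves on OLS.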

Hoerl and Kennard (1970b) also suggested a graphical technique to determine the value of $k$. In this technique, one first constructs a plot of the components of $\beta^*$ versus $k$, called the ridge trace. Then, using this ridge trace, one chooses a value of $k$ for which the estimates have "stabilized". Analytic methods for choosing $k$ have also been proposed; see, for example, McDonald and Galarneau (1975) or Hoerl, Kennard, and Baldwin (1975). In any event, once the value of $k$ has been determined in some manner, (1.3) gives the estimate of $\beta$ to be reported.
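The ridge trace is simply $\beta^*(k)$ from (1.3) evaluated over a grid of $k$. A minimal sketch, assuming hypothetical data with two nearly collinear regressors; `solve` is a small Gaussian-elimination helper written so that only the standard library is needed, and `ridge_trace` is an illustrative name.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y of (1.3); k = 0 gives OLS."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def ridge_trace(X, y, ks):
    """Return [(k, beta*(k)), ...]; plotting each component against k gives the ridge trace."""
    return [(k, ridge(X, y, k)) for k in ks]


if __name__ == "__main__":
    # Hypothetical data (not the paper's design): x1 and x2 are nearly collinear.
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
    y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    for k, beta in ridge_trace(X, y, [0.0, 0.01, 0.1, 1.0, 10.0]):
        print(k, [round(v, 3) for v in beta])
```

The squared length of $\beta^*(k)$ decreases as $k$ grows, consistent with property (2); choosing where the components "stabilize" remains a visual judgment.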

Research  papers  appearing  since  the  Hoerl  and  Kennard  articles 
have  not  always  used  this  estimator  as  their  ridge  estimator  because 


advocate using the model with only $X$ standardized. However, they suggest choosing the value of $k$ with $y$ also standardized. Hemmerle (1975) standardizes the design matrix $X$, but not the observation vector $y$. McDonald and Galarneau (1975) standardize both $X$ and $y$, but Guilkey and Murphy (1975) make no standardizations at all. Obenchain (1975) centers both $X$ and $y$ and warns against scaling or reparameterizing unless the results are ultimately to be reported in that form. Thisted (1976) gives a good discussion of the problems involved in standardizing the variables.

Comparison of the results of one investigator with those of another may be hampered by the lack of uniformity in the form of the model. Most authors have recognized this problem, but have not investigated it further. In the case of the OLS solution, the same estimator is produced from any of the forms of the model: whichever form a problem is stated in, the same OLS estimator of $\beta$ is obtained when it is measured in the same parameter space. This convenient property does not hold for ridge estimators, however. In this paper, formulae are given for different forms of the model, and comparisons are made among them.

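In order to examine the effect of standardization on the ridge estimator, first consider the model. If no transformations are made, the model is given in (1.1). If $X$ and $\beta$ are partitioned as $X = [\mathbf{1} : G]$ and $\beta = (\beta_0, \beta_G')'$, and the columns of $G$ are centered about their means, the model can be written as

$$y = \alpha_0 \mathbf{1} + CG\beta_G + \varepsilon, \qquad \alpha_0 = E[\bar y] = n^{-1}\mathbf{1}'X\beta, \qquad (1.4)$$

where $C$ is the symmetric idempotent matrix $I - n^{-1}J$ with $J = \mathbf{1}\mathbf{1}'$, and $\bar y = n^{-1}\sum_i y_i$. If the vectors in $CG$ are also scaled to have unit length, then the model can be written as

$$y = \alpha_0 \mathbf{1} + H\gamma + \varepsilon, \qquad \alpha_0 = E[\bar y] = n^{-1}\mathbf{1}'X\beta, \qquad (1.5)$$

where $H = CGD^{-1/2}$, $\gamma = D^{1/2}\beta_G$, and $D$ is the diagonal matrix of scaling factors (the squared lengths of the columns of $CG$). In the case of the OLS estimator, the three forms of the model, (1.1), (1.4), and (1.5), are known to give equivalent estimators of $\beta$.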
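The transformations behind (1.4) and (1.5) can be sketched directly. A minimal illustration on a hypothetical $G$; the standardized coefficients are written here as $\gamma = D^{1/2}\beta_G$, so that $H\gamma = CG\beta_G$ and the mean structure of the model is unchanged. The helper names are hypothetical.

```python
import math


def center_and_scale(G):
    """Return CG (centered columns), the diagonal of D (squared lengths of the columns of CG),
    and H = CG D^{-1/2} (centered columns scaled to unit length)."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    CG = [[row[j] - means[j] for j in range(p)] for row in G]
    d = [sum(row[j] ** 2 for row in CG) for j in range(p)]
    H = [[row[j] / math.sqrt(d[j]) for j in range(p)] for row in CG]
    return CG, d, H


def to_standardized(beta_G, d):
    """gamma = D^{1/2} beta_G, the coefficients of the standardized model (1.5)."""
    return [math.sqrt(dj) * bj for dj, bj in zip(d, beta_G)]


if __name__ == "__main__":
    G = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0]]   # hypothetical
    CG, d, H = center_and_scale(G)
    print("D diagonal:", d)
    print("column lengths of H:", [sum(r[j] ** 2 for r in H) for j in range(2)])
```

Each column of $H$ sums to zero (it is orthogonal to $\mathbf{1}$) and has unit length, so $H'H$ is the correlation matrix of the regressors.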
II. STANDARDIZATION IN RIDGE ESTIMATION


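For the model (1.1) the ridge estimator is given by

$$\beta^* = (X'X + k_1 I)^{-1}X'y. \qquad (2.1)$$

In order to make the comparison easier, partition $X$ and $\beta$ as before. Then $\beta^*$ can be written as

$$\beta^* = (\beta^*_{01}, \beta_1^{*\prime})', \qquad (2.2)$$

where

$$\beta^*_1 = \left(G'\left(I - (n + k_1)^{-1}J\right)G + k_1 I\right)^{-1} G'\left(I - (n + k_1)^{-1}J\right)y,$$

$$\beta^*_{01} = \frac{n}{n + k_1}\left(\bar y - n^{-1}\mathbf{1}'G\beta^*_1\right),$$

and $J = \mathbf{1}\mathbf{1}'$.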
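The partitioned form (2.2) follows from solving the normal equations $(X'X + k_1I)\beta^* = X'y$ blockwise: the first equation gives $\beta^*_{01}$ in terms of $\beta^*_1$, and substituting it into the remaining equations gives $\beta^*_1$. The sketch below, on hypothetical data, checks numerically that (2.1) and (2.2) agree; `partitioned_ridge` is an illustrative name and $J$ is handled through column sums rather than formed explicitly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, k):
    """(2.1): (X'X + kI)^{-1} X'y with the intercept column included in X."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def partitioned_ridge(G, y, k1):
    """(2.2): returns [beta*_01] + beta*_1, using 1'G (column sums) in place of J = 11'."""
    n, p = len(G), len(G[0])
    c = 1.0 / (n + k1)
    colsum = [sum(row[j] for row in G) for j in range(p)]   # 1'G
    ysum = sum(y)                                            # 1'y
    # G'(I - cJ)G + k1 I
    A = [[sum(row[i] * row[j] for row in G) - c * colsum[i] * colsum[j] + (k1 if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    # G'(I - cJ)y
    b = [sum(row[i] * yi for row, yi in zip(G, y)) - c * colsum[i] * ysum for i in range(p)]
    beta1 = solve(A, b)
    ybar = ysum / n
    beta01 = n / (n + k1) * (ybar - sum(s * bj for s, bj in zip(colsum, beta1)) / n)
    return [beta01] + beta1


if __name__ == "__main__":
    G = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1], [6.0, 5.8]]  # hypothetical
    y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
    X = [[1.0] + row for row in G]
    print(ridge(X, y, 0.5))
    print(partitioned_ridge(G, y, 0.5))
```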


For the centered model (1.4), the ridge estimator of $\beta$ is

$$(\beta^*_{02}, \beta_2^{*\prime})', \qquad (2.3)$$

where

$$\beta^*_2 = (G'CG + k_2 I)^{-1}G'Cy, \qquad \beta^*_{02} = \bar y - n^{-1}\mathbf{1}'G\beta^*_2.$$

In general, this is a distinct estimator from that given in (2.2). If $\beta^*_1 \ne \beta^*_2$, there is nothing to prove. If $\beta^*_1 = \beta^*_2$, then

$$\beta^*_{01} = \frac{n}{n + k_1}\,\beta^*_{02} \ne \beta^*_{02}$$

unless $k_1 = 0$, that is, unless the estimator is the OLS estimator (or $\beta^*_{02}$ happens to be zero). Hence, these estimators are not the same.
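A numerical illustration of the difference between (2.2) and (2.3), on hypothetical data. The centered-model estimator leaves the intercept unpenalized, while (2.1) penalizes all $p + 1$ coefficients; with $k = 0$ both reduce to OLS. The names `ridge` and `centered_ridge` are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, k):
    """(2.1)/(2.2): ridge on the original model, intercept column included and penalized."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def centered_ridge(G, y, k2):
    """(2.3): returns [beta*_02] + beta*_2; ridge on CG, intercept left unpenalized."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    CG = [[row[j] - means[j] for j in range(p)] for row in G]
    # (CG)'(CG) = G'CG because C is symmetric and idempotent
    A = [[sum(r[i] * r[j] for r in CG) + (k2 if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(CG, y)) for i in range(p)]   # (CG)'y = G'Cy
    beta2 = solve(A, b)
    ybar = sum(y) / n
    beta02 = ybar - sum(m * bj for m, bj in zip(means, beta2))       # ybar - n^{-1} 1'G beta*_2
    return [beta02] + beta2


if __name__ == "__main__":
    G = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1], [6.0, 5.8]]  # hypothetical
    y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
    X = [[1.0] + row for row in G]
    for k in (0.0, 1.0):
        print(k, ridge(X, y, k), centered_ridge(G, y, k))
```

For $k > 0$ the two disagree, most visibly in the intercept, which (2.2) shrinks by the factor $n/(n + k_1)$.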

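For the third form of the model (1.5), which will be referred to as the standardized form of the model, the estimator of the original parameter $\beta$ is

$$(\beta^*_{03}, \beta_3^{*\prime})', \qquad (2.4)$$

where

$$\beta^*_3 = (G'CG + k_3 D)^{-1}G'Cy, \qquad \beta^*_{03} = \bar y - n^{-1}\mathbf{1}'G\beta^*_3,$$

and $\beta^*_3$ is obtained from the ridge estimator $(H'H + k_3 I)^{-1}H'y$ for model (1.5) by premultiplying by $D^{-1/2}$.

The estimator (2.4) can be seen to be different from that given in (2.2) in the same way as (2.3) was. The estimators for the centered model are equivalent to the estimators for the standardized model only when $D$ is a scalar multiple of the identity. In this case, there do exist values of $k_2$ and $k_3$ that make the estimators equivalent: if $D = dI$, taking $k_2 = d\,k_3$ gives $\beta^*_2 = \beta^*_3$.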
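Both claims about (2.4) can be checked numerically: ridge estimation on $H = CGD^{-1/2}$ followed by back-transformation gives $(G'CG + k_3D)^{-1}G'Cy$, and when the centered columns all have the same squared length $d$ the standardized slopes coincide with the centered slopes at $k_2 = d\,k_3$. The sketch uses hypothetical data; `penalized_slopes` and the other helpers are illustrative names.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def centered_design(G):
    """Return CG and the diagonal of D (squared lengths of the columns of CG)."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    CG = [[row[j] - means[j] for j in range(p)] for row in G]
    return CG, [sum(row[j] ** 2 for row in CG) for j in range(p)]


def penalized_slopes(Z, y, penalty):
    """(Z'Z + diag(penalty))^{-1} Z'y."""
    p = len(Z[0])
    A = [[sum(r[i] * r[j] for r in Z) + (penalty[i] if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(p)]
    return solve(A, b)


def standardized_slopes(G, y, k3):
    """beta*_3 = (G'CG + k3 D)^{-1} G'Cy, as in (2.4)."""
    CG, d = centered_design(G)
    return penalized_slopes(CG, y, [k3 * dj for dj in d])


def standardized_slopes_via_H(G, y, k3):
    """Ridge on H = CG D^{-1/2} with penalty k3 I, then back-transform by D^{-1/2}."""
    CG, d = centered_design(G)
    H = [[r[j] / math.sqrt(d[j]) for j in range(len(d))] for r in CG]
    g = penalized_slopes(H, y, [k3] * len(d))
    return [gj / math.sqrt(dj) for gj, dj in zip(g, d)]


if __name__ == "__main__":
    G = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]   # hypothetical; both d_j equal 5
    y = [1.0, 3.0, 2.0, 5.0]
    CG, d = centered_design(G)
    print(standardized_slopes(G, y, 0.3), standardized_slopes_via_H(G, y, 0.3))
    print(penalized_slopes(CG, y, [0.3 * d[0]] * 2))      # centered slopes with k2 = d * k3
```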

To this point in the development, the error structure has remained the same. However, if transformations are made on the observation vector, the error structure is also changed. It is not too difficult to show that if the design matrix is centered, the same ridge estimator is produced by using $y$ in its original form or in its centered form, even if the equations to be solved do not take into account the error structure; since $C$ is idempotent, $(CG)'Cy = (CG)'y$. It can also be shown that centering and scaling $y$ has no effect on the estimator if $X$ has also been centered and scaled.
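A quick check of the centering claim, assuming hypothetical data and a fixed $k$: the centered-model slopes are unchanged when $y$ is replaced by $y - \bar y\mathbf{1}$, and, because the estimator is linear in $y$, dividing $y$ by a constant $s$ and multiplying the resulting slopes by $s$ recovers the original slopes. `centered_slopes` is an illustrative name.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def centered_slopes(G, y, k):
    """beta*_2 = (G'CG + kI)^{-1} G'Cy for the centered model (1.4)."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    CG = [[row[j] - means[j] for j in range(p)] for row in G]
    A = [[sum(r[i] * r[j] for r in CG) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(CG, y)) for i in range(p)]   # columns of CG sum to 0
    return solve(A, b)


if __name__ == "__main__":
    G = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1], [6.0, 5.8]]  # hypothetical
    y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
    ybar = sum(y) / len(y)
    print(centered_slopes(G, y, 0.5))
    print(centered_slopes(G, [yi - ybar for yi in y], 0.5))
```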




In order to evaluate the three classes of estimators, squared error (squared Euclidean distance) was used as the basis of comparison in this study. While it is possible to find explicit formulae for the MSE (i.e., the expected squared error) of each estimator, the distribution of the squared error should also be investigated. This led to a simulation study.

As the discussion above has emphasized, there are three parameter spaces in which such an evaluation may be made. Comparisons can be based on the parameterization of (1) the original model, (2) the centered model, or (3) the standardized model. In this study, comparisons were made in all three parameter spaces and also in the observation ($y$) space.

For this simulation study, a design discussed in Marquardt and Snee (1975) was used. In the design matrix there are three vectors orthogonal to $\mathbf{1}$. Two of these vectors are highly correlated ($r = 0.989$), and the other correlations are zero. This design matrix, which has eight observations and four parameters to be estimated, is given in Figure 1.

The eigenvalues and eigenvectors of the correlation matrix were calculated. The eigenvectors $(-1, -1, 0)'$ and $(-1, 1, 0)'$, corresponding to the maximum and minimum eigenvalues respectively, were multiplied by $\sigma$ and $3\sigma$, where $\sigma = 0.8$, the same standard deviation value used by Marquardt and Snee. The four resulting vectors were transformed from the standardized space back to the original space, and the constant term in the model was set equal to 0. Using each of these four vectors as the true $\beta_G$, 1000 replicates of $y$ were generated from model (1.1) using $\sigma = 0.8$.


Since $\sigma$ and $\beta$ are known, the value of $k$ was chosen by a computer routine to minimize the MSE of each estimator, using formulae similar to that given in Hoerl and Kennard (1970a). Hence, in each case, the optimal value of $k$ (conditional on $\sigma$ and $\beta$) was used.
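The Hoerl and Kennard (1970a) formulae used by the original routine are not reproduced here. As a plainly different stand-in, the sketch below grid-searches the MSE expression of property (4) written in canonical coordinates ($X'X = P\Lambda P'$, $\alpha = P'\beta$, notation assumed for this sketch) when $\sigma$ and $\alpha$ are known. The grid limits and the name `optimal_k` are hypothetical choices.

```python
def ridge_mse(k, eigenvalues, alpha, sigma):
    """Theoretical MSE of the ridge estimator in canonical coordinates (property (4))."""
    variance = sigma ** 2 * sum(lam / (lam + k) ** 2 for lam in eigenvalues)
    squared_bias = k ** 2 * sum(a ** 2 / (lam + k) ** 2 for lam, a in zip(eigenvalues, alpha))
    return variance + squared_bias


def optimal_k(eigenvalues, alpha, sigma, k_max=10.0, steps=20000):
    """Grid search over k in [0, k_max] for the smallest theoretical MSE (sigma, alpha known)."""
    best_k, best_mse = 0.0, ridge_mse(0.0, eigenvalues, alpha, sigma)
    for i in range(1, steps + 1):
        k = k_max * i / steps
        mse = ridge_mse(k, eigenvalues, alpha, sigma)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse


if __name__ == "__main__":
    # Hypothetical canonical inputs; sigma = 0.8 as in the simulation described in the text.
    k, mse = optimal_k((1.989, 0.011), (1.0, 1.0), 0.8)
    print(f"optimal k = {k:.4f}, MSE = {mse:.4f}")
```

For a single canonical coordinate, setting the derivative to zero gives $k = \sigma^2/\alpha^2$, which provides a quick sanity check on the search.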

At this point, the MSE of each estimator in each space could be calculated theoretically, but doing so would not tell very much about the distribution of the squared error of the estimators. Therefore, the simulation study was conducted. In this simulation the true value of $\beta = (\beta_0, \beta_G')'$ is defined such that

(1) $\beta_0 = 0$;

(2) $\beta_G$, when transformed to the standardized space, is equal to $av$, where $a$ denotes $\sigma$ or $3\sigma$, and $v$ denotes $(-1, -1, 0)'$, an eigenvector corresponding to the maximum eigenvalue, or $(-1, 1, 0)'$, an eigenvector corresponding to the minimum eigenvalue.

The values of $a$ and $v$ are used in Figures 2 through 6 to indicate the true value of $\beta$ used in the simulation.
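The construction of the true parameter and the replicates can be sketched as follows. Since Figure 1 is not reproduced here, the design `G_EXAMPLE` is hypothetical: its third column is chosen to be uncorrelated with the first two (as in the paper's design), so $(-1, -1, 0)'$ and $(-1, 1, 0)'$ are again eigenvectors of its correlation matrix, but its $r$ is not 0.989. The map back from the standardized space is $\beta_G = D^{-1/2}(av)$, and the seeded generator is an assumption made for reproducibility.

```python
import math
import random

# Hypothetical 8 x 3 design standing in for Figure 1; column 3 is orthogonal to columns 1 and 2.
G_EXAMPLE = [
    [1.0, 1.2, 1.0],
    [2.0, 1.8, -1.0],
    [3.0, 3.1, -1.0],
    [4.0, 4.2, 1.0],
    [5.0, 4.8, 1.0],
    [6.0, 6.1, -1.0],
    [7.0, 7.2, -1.0],
    [8.0, 8.0, 1.0],
]


def squared_centered_lengths(G):
    """Diagonal of D: squared lengths of the centered columns CG."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    return [sum((row[j] - means[j]) ** 2 for row in G) for j in range(p)]


def true_beta_G(a, v, d):
    """Map the standardized-space vector a*v back to the original space: D^{-1/2}(a v)."""
    return [a * vj / math.sqrt(dj) for vj, dj in zip(v, d)]


def simulate_replicates(G, beta0, beta_G, sigma, n_rep, seed=0):
    """Draw n_rep vectors y = beta0 * 1 + G beta_G + eps with eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    mean = [beta0 + sum(g * b for g, b in zip(row, beta_G)) for row in G]
    return [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(n_rep)]


if __name__ == "__main__":
    sigma = 0.8
    d = squared_centered_lengths(G_EXAMPLE)
    for a in (sigma, 3 * sigma):
        for v in ((-1.0, -1.0, 0.0), (-1.0, 1.0, 0.0)):
            beta_G = true_beta_G(a, v, d)
            reps = simulate_replicates(G_EXAMPLE, 0.0, beta_G, sigma, 1000)
            print(a, v, [round(b, 4) for b in beta_G], len(reps))
```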

Estimated MSE values for each of the estimators are presented in Figure 2. Figures 3 through 6 show how the estimators compare based on the simulations. Each entry in these figures is the number of times out of 1000 that the estimator at the top of that column had smaller squared error than the estimator given at the left of that row. Figure 3 compares the estimators in the original space, Figure 4 in the centered space, Figure 5 in the standardized space, and Figure 6 in the observation space.

Subject to computing error not exceeding 1.0 x 10

Figure 2: Estimated MSE for Each of the Estimators (columns: Original, Centered, Standardized, Observation)

Figure 3: Comparison of Estimates of $\beta$ from the Simulation

Figure 4: Comparison of Estimates of the Centered-Model Parameters from the Simulation

Figure 5: Comparison of Estimates of the Standardized-Model Parameters from the Simulation

Figure 6: Comparison of Estimates of $X\beta$ from the Simulation

Notes to Figures 3 through 6:

Note 1: The true value of the parameter, when transformed to the standardized space, is equal to $av$. (See text for full discussion.)

Note 2: Each entry in the figure is the number of times out of 1000 replicates that the estimator at the top of the column had smaller squared error than the estimator at the left of the row.
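The entries of Figures 3 through 6 are pairwise win counts. A minimal sketch of the tally, assuming each estimator's squared errors over the replicates are stored in a dictionary keyed by a hypothetical estimator name. Ties are counted for neither estimator; the text does not say how ties were handled, so this is an assumption.

```python
def squared_error(estimate, truth):
    """Squared Euclidean distance between an estimate and the true vector."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


def comparison_counts(errors):
    """errors: name -> list of squared errors over replicates.
    Entry (row, col) counts replicates where the column estimator had strictly smaller error."""
    names = list(errors)
    return {
        (row, col): sum(ec < er for er, ec in zip(errors[row], errors[col]))
        for row in names
        for col in names
        if row != col
    }


if __name__ == "__main__":
    # Hypothetical squared errors for three replicates.
    errors = {"original": [1.0, 2.0, 3.0], "centered": [2.0, 1.0, 4.0]}
    print(comparison_counts(errors))
```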

In the simulation the OLS estimator performed as expected. When OLS was compared with each of the ridge estimators separately, each ridge estimator yielded estimates with smaller squared error than OLS in at least 57% of the simulations, in each of the three parameter spaces and for all values of $\beta$ considered. Since the OLS estimator is a special case of all three ridge estimators, and the ridge estimators were chosen with optimal properties, this result is not at all surprising. Also, as expected in theory, the OLS estimates of $X\beta$ produced the smallest squared error for every $y$ generated.

For the original parameter $\beta$ (Figure 3), the ridge estimates from model (1.1), when compared with transformed estimates from either of the other two models, (1.4) and (1.5), were closest to the true $\beta$ in more than 69% of the simulations, regardless of the magnitude or orientation of the parameter vector. Since these results were based on the use of optimal $k$ values, it does not necessarily follow that similar results will hold for nonoptimal (stochastic) $k$. Nonetheless, when a ridge estimator is to be used to estimate $\beta$, it appears that the estimator $\beta^*$ derived from the original model may perform the best.


… and 5), it appears that in each of these cases it might be well to base the estimation process on the standardized-model parameters, i.e., by using ridge estimation with the standardized model.

An estimate of $X\beta$ can be obtained by first estimating the parameter $\beta$ and then premultiplying the estimate by $X$. As can be seen from Figure 6, the estimate of $X\beta$ based on the centered model appears to be the most likely candidate of the three ridge estimators considered when comparisons are made in the observation space. Of course, since the OLS estimator minimizes $(y - X\beta)'(y - X\beta)$, none of the ridge estimators ever resulted in a smaller squared error when measured on this basis.
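The last statement can be illustrated directly: because OLS minimizes $(y - X\beta)'(y - X\beta)$, its residual sum of squares is never larger than that of any ridge fit. A minimal sketch on hypothetical data; `residual_ss` is an illustrative name.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, k):
    """(X'X + kI)^{-1} X'y; k = 0 gives OLS."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def residual_ss(X, y, beta):
    """(y - X beta)'(y - X beta)."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2 for row, yi in zip(X, y))


if __name__ == "__main__":
    G = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1], [6.0, 5.8]]  # hypothetical
    y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
    X = [[1.0] + row for row in G]
    for k in (0.0, 0.1, 1.0, 10.0):
        print(k, residual_ss(X, y, ridge(X, y, k)))
```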


III.  CONCLUSIONS 


The  research  described  in  this  technical  report  has  been 
concerned  with  three  ridge  estimators.  Since  each  was  chosen  to 
be  an  optimal  estimator  within  a particular  transformation  of  the 
parameter  space  with  respect  to  the  problem  being  considered,  the 
results  do  not  address  the  question  of  when  ridge  estimation  rather 
than  OLS  estimation  should  be  used.  Instead,  given  that  ridge 
regression is to be used, these results do address the question of
which  ridge  estimator  to  use. 

While the results of this small-scale simulation are not conclusive, they do indicate that care needs to be taken in deciding which estimator to use. The choice of estimator will depend on the criteria used to measure the goodness of the estimation process. For example, a ridge estimator with good squared error properties in the standardized parameter space may not do as well as another ridge estimator when transformed back to the original parameter space. In this case, the data analyst must decide whether measurement of squared error is more appropriate in the standardized space or in the original space.
This paper compares ridge estimators for $\beta$ that arise when the biasing factor ($k$) is applied at different stages of standardization (i.e., centering and scaling), and shows which estimators are identical and which are different. In addition, results of a small-scale simulation are discussed.